The Analysis of Adaptive Data Collection Methods for Machine Learning

Author

  • Kevin Jamieson
Abstract

Over the last decade, the machine learning community has watched the size and complexity of datasets grow at an exponential rate, with some describing the phenomenon as big data. There are two main bottlenecks for the performance of machine learning methods: computational resources and the amount of labelled data, often provided by a human expert. Advances in distributed computing and the advent of cloud computing platforms have turned computational resources into a commodity, and the price has predictably dropped precipitously. But the human response time has remained constant: the time it will take a human to answer a question tomorrow is the same as it takes today, but tomorrow it will cost more due to rising wages worldwide. This thesis proposes a simple solution: require fewer labels by asking better questions. One way to ask better questions is to make the data collection procedure adaptive, so that the question asked next depends on all the information gathered up to the current time. Popular examples of adaptive data collection procedures include the 20 questions game or simply the binary search algorithm. We will investigate several examples of adaptive data collection methods, and for each we will be interested in answering questions such as: How many queries are sufficient for a particular algorithm to achieve a desired prediction error? How many queries must any algorithm necessarily ask to achieve a desired prediction error? What are the fundamental quantities that characterize the difficulty of a particular problem? This thesis focuses on scenarios where the answers to queries are provided by a human. Humans are much more comfortable offering qualitative statements in practice like “this
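
The abstract's binary search example can be made concrete with a small sketch. It is only an illustration under strong assumptions that are not taken from the thesis: a sorted one-dimensional pool of points, a noiseless labeller, and labels that switch from 0 to 1 at a single unknown threshold. The names `find_threshold_adaptively` and `ask_label` are hypothetical. Each query is chosen from the answers received so far, so roughly log2(n) labels suffice where labelling the whole pool would need n.

```python
from typing import Callable, Sequence, Tuple


def find_threshold_adaptively(
    points: Sequence[float],
    ask_label: Callable[[float], int],
) -> Tuple[int, int]:
    """Locate the first point labelled 1 in a sorted pool.

    Assumes (hypothetically) that labels are noiseless and monotone:
    every point below some unknown threshold is 0, every point at or
    above it is 1. Returns (index of first 1, number of label queries).
    """
    lo, hi = 0, len(points)  # the first 1 lies somewhere in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        # The next question depends on every answer received so far.
        if ask_label(points[mid]) == 1:
            hi = mid  # first 1 is at mid or to its left
        else:
            lo = mid + 1  # first 1 is strictly to the right
    return lo, queries


# Simulated labeller with a hidden threshold at 637 (illustrative value).
pool = list(range(1000))
index, queries = find_threshold_adaptively(pool, lambda x: int(x >= 637))
print(index, queries)
```

A human labeller is rarely noiseless, so this sketch shows only the query savings from adaptivity, not a method for noisy answers.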

Similar resources

A Hybrid Machine Learning Method for Intrusion Detection

Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Various techniques of artificial intelligence have already been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...

Debt Collection Industry: Machine Learning Approach

Businesses are increasingly interested in how big data, artificial intelligence, machine learning, and predictive analytics can be used to increase revenue, lower costs, and improve their business processes. In this paper, we describe how we have developed a data-driven machine learning method to optimize the collection process for a debt collection agency. Precisely speaking, we create a frame...

Evaluating machine learning methods and satellite images to estimate combined climatic indices

The reflections recorded in satellite images are affected by various environmental factors. In these images, some of these factors are combined with other environmental factors and cannot be distinguished from them. Therefore, it seems wise to model these environmental phenomena in the form of hybrid indicators. In this regard, satellite imagery and machine learning methods can play a unique role ...

Semantic Preserving Data Reduction using Artificial Immune Systems

Artificial Immune Systems (AIS) can be defined as soft computing systems inspired by the immune system of vertebrates. The immune system is an adaptive pattern recognition system. AIS have been used in pattern recognition, machine learning, optimization, and clustering. Feature reduction refers to the problem of selecting those input features that are most predictive of a given outcome; a problem encoun...

Investigating the performance of machine learning-based methods in classroom reverberation time estimation using neural networks (Research Article)

Classrooms, as one of the most important educational environments, play a major role in the learning and academic progress of students. Reverberation time, as one of the most important acoustic parameters inside rooms, has a significant effect on sound quality. The inefficiency of classical formulas such as Sabine's led this article to examine the use of machine learning methods as an alternat...

Publication date: 2015